
Algorithms in Neuroscience

The goal of this lecture is to learn how to use computational principles to understand natural phenomena.

Biological systems of all scales have evolved to solve the challenges posed by their environments in order to give themselves the best chance of survival and reproduction. Algorithms are processes or sets of rules used to solve classes of problems, which makes them a powerful tool for systematizing and describing these natural strategies.


Marr's Levels of Analysis

  1. Computational level: what does the system do, and why does it do it?

  2. Algorithmic/representational level: how does the system do what it does?

  3. Implementational/physical level: how is the system physically realized?


Reinforcement Learning

Reinforcement Learning (RL) is the process by which an agent learns to act in its environment such that it maximizes its cumulative reward.

Understanding RL is a fundamental challenge for neuroscience because it underpins how the brain learns to behave!

Challenges Facing RL

  1. Curse of Dimensionality, i.e. the world is far too complex for brute-force trial and error

  2. Temporal Credit Assignment

  3. Imperfect Information

  4. Exploration-Exploitation Dilemma

  5. ...
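To make the first challenge concrete, consider how quickly a state space grows with the number of state variables. A small Python sketch (the variable counts are made-up example numbers):

```python
# The curse of dimensionality: the number of distinct states grows
# exponentially with the number of state variables, so brute-force
# trial and error over every state quickly becomes infeasible.

def num_states(n_variables: int, values_per_variable: int) -> int:
    """Size of a discrete state space with independent variables."""
    return values_per_variable ** n_variables

print(num_states(20, 2))   # 1048576: even a toy binary world has a million states
print(num_states(40, 2))   # 1099511627776: over a trillion states
```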


Temporal Credit Assignment

Using Marr's framework, what computational problem must RL solve?

An agent must bind rewards to temporally distal prior states and actions.

We must introduce some symbols to describe this problem algebraically.

Let $S_t$ be the state of the agent and environment at time $t$. Let $\pi$ be a policy, i.e. a set of state–action pairs mapping states to actions, let $r_t$ be the reward received at $S_t$, and let $\lambda \in [0, 1)$ be the discount rate. $V^\pi(S_t)$ is the value of the state $S_t$ under the policy $\pi$.

$$V^\pi(S_t) = \sum_{k=0}^{\infty} \lambda^{k}\, r_{t+k}$$

That is, the value of the current state is the discounted sum of all expected future rewards.
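A quick numeric illustration of this definition, as a Python sketch (the reward sequence and discount rate below are made-up example values):

```python
# The value of the current state is the discounted sum of future
# rewards: V(S_t) = sum_k lam**k * r_{t+k}.

def discounted_return(rewards, lam):
    """Discounted sum of a finite reward sequence."""
    return sum(lam ** k * r for k, r in enumerate(rewards))

rewards = [0, 0, 0, 1]   # a single reward arriving three steps ahead
print(discounted_return(rewards, lam=0.9))   # 0.9**3 ~= 0.729
```

Note how the same reward is worth less the further in the future it arrives.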

To derive an algorithm to learn $V^\pi(S_t)$, we start by splitting off the first term of the sum.

$$V^\pi(S_t) = r_t + \sum_{k=0}^{\infty} \lambda^{k+1}\, r_{(t+1)+k}$$

The remaining sum is $\lambda$ times the value of the next state, so we can now recursively define the function as

$$V^\pi(S_t) = r_t + \lambda V^\pi(S_{t+1})$$

This recursive formulation gives us the simple learning rule

$$V^\pi(S_t) \leftarrow V^\pi(S_t) + \alpha\left[r_t + \lambda V^\pi(S_{t+1}) - V^\pi(S_t)\right]$$

where $\alpha$ is the learning rate and $r_t + \lambda V^\pi(S_{t+1}) - V^\pi(S_t)$ is the reward prediction error.

This is Temporal-Difference (TD) learning!
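The update rule above can be sketched in a few lines. This is a minimal tabular TD(0) illustration in Python (the notebook's own code is Julia); the states, reward, and hyperparameter values are made up:

```python
# Tabular TD(0): V(s) <- V(s) + alpha * [r + lam * V(s_next) - V(s)]

def td_update(V, s, r, s_next, alpha, lam):
    """Apply one TD update to the value table V and return the error."""
    rpe = r + lam * V[s_next] - V[s]   # reward prediction error
    V[s] += alpha * rpe
    return rpe

# Toy example: two states, a reward of 1 received when leaving "A".
V = {"A": 0.0, "B": 0.0}
rpe = td_update(V, "A", r=1.0, s_next="B", alpha=0.5, lam=0.9)
print(rpe, V["A"])   # 1.0 0.5
```

On the first visit the full reward is unpredicted, so the prediction error equals the reward and the value estimate moves a fraction $\alpha$ of the way toward it.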


TD Learning in the Brain

How might TD learning be implemented in the brain?

Dopaminergic activity in the substantia nigra pars compacta (SNc) and ventral tegmental area (VTA) has been shown to encode reward prediction errors, which could provide a substrate for neural TD learning.


Exploration-Exploitation Dilemma

To find an optimal policy, agents must thoroughly search the space of possible states. However, practical constraints require the agent to balance this exploration against exploiting known, high-value states.

A common solution is to sample states stochastically, biased toward those with higher known value. One of the simplest such algorithms is called ε-greedy: it selects the highest-value state with probability 1 − ε and a random state with probability ε.
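The ε-greedy rule is short enough to sketch directly. A Python illustration (the option names and values are made up):

```python
import random

# Epsilon-greedy selection: exploit the highest-valued option with
# probability 1 - eps, otherwise explore uniformly at random.

def epsilon_greedy(values, eps, rng=random):
    """values: dict mapping options to estimated values."""
    if rng.random() < eps:
        return rng.choice(list(values))   # explore
    return max(values, key=values.get)    # exploit

values = {"left": 0.2, "right": 0.7}
print(epsilon_greedy(values, eps=0.0))    # eps = 0 is purely greedy: right
```

Setting ε = 0 recovers pure exploitation; ε = 1 is pure random exploration. In practice ε is often kept small or decayed over time.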


TD Learning Example

Below is an implementation of TD learning that finds the shortest path between two nodes of a graph. The Agent begins at the Start node, marked in red, and travels along the edges, from node to node, until it reaches the Terminal node, marked in blue, and receives a reward. After each move, the Agent updates its estimate of the value of each node, which reflects how many moves that node is from the reward.
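The flavor of this experiment can be reproduced in a few lines. Here is a hedged Python sketch on a tiny fixed line graph 0 – 1 – 2 – 3 rather than a random maze (the notebook's own Julia code is hidden in this export, and the hyperparameter values below are arbitrary):

```python
import random

# TD learning with epsilon-greedy moves on a line graph 0 - 1 - 2 - 3.
# The agent starts at node 0 and walks until it reaches terminal node 3,
# where it receives a reward of 1; every move triggers a TD update.

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: []}
V = {n: 0.0 for n in neighbors}          # value estimate per node
alpha, lam, eps = 0.2, 0.9, 0.1          # made-up hyperparameters
rng = random.Random(0)                   # seeded for reproducibility

def choose(node):
    """Epsilon-greedy choice among neighboring nodes."""
    opts = neighbors[node]
    if rng.random() < eps:
        return rng.choice(opts)
    return max(opts, key=lambda n: V[n])

for _ in range(200):                     # episodes
    node = 0
    while node != 3:
        nxt = choose(node)
        # non-terminal moves carry no reward (r = 0)
        V[node] += alpha * (lam * V[nxt] - V[node])
        node = nxt
    # terminal node: reward 1, no successor state (V of successor = 0)
    V[3] += alpha * (1.0 - V[3])

# Learned values decay with distance from the reward, encoding the
# shortest-path structure of the graph: V[3] > V[2] > V[1] > V[0].
print(sorted(V, key=V.get, reverse=True))
```

After training, each node's value is roughly λ raised to its distance from the terminal node, which is exactly the "how many moves away" signal described above.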

[Interactive controls: Graph Size = 16, Sparsity = 0.2; Start and End node selectors; λ = 0.76, α = 0.2, ε = 0.1; Run Sim / Iterations]

Defines Graph Maze


Defines Agent


Implements ε-greedy algorithm


Implements TD Learning
